min rank | avg. rank | sentence |
---|---|---|
2612 | 30062.8333 | E. s. erlangeri, mannetjie, Serengeti, Tanzanië. |
2009 | 33334.6667 | Tipiese kasteel internetrappe (Caernarfon Kasteel, Wallis ). |
1919 | 55963.4000 | Lêer:Rembrandt - Klesveverlaugets forstandere i Amsterdam. |
1769 | 26890.4000 | D., Uitschot 1990-1998, Vryheid 1998-2004, professor Teologiese Skool Potchefstroom 2005-†2016-01-20. |
1769 | 21350.8462 | W.J. Snyman, Lindley 1925-1927, Venterstad 1927-1944, Krokodilrivier 1944-1946, professor Teologiese Skool Potchefstroom 1946-1970. |
1505 | 6570.7000 | Durban: Butterworth & Kie (SA) (Edms.) Bpk. * Potgieter, D.J. (ed.) 1972. |
1492 | 21446.3333 | Demografie * Tale (2001-sensus): Oekraïens (95,3%), Russies (3,8%), Pools (0,4%). |
1360 | 23454.0000 | D., D.Litt. (Honoris Causa), Frankfort 1981-1983, professor Teologiese Skool Potchefstroom 1983. |
1291 | 4647.6667 | Plekke: huise, koninklike howe, teaters, paleise. |
1197 | 69724.8889 | Dus: :Llandudno, Ffestiniog, Rhuthun, ens. (plekname) :Llŷr, Rhian, ens (persoonlike eiename) :Rhedeg busnes dw i. Llyfrgellydd ydy hi. |
812 | 6857.2857 | Plekke: velde, fonteine, baddens, hawens, verlate plekke. |
738 | 25233.1765 | D., Laeveld 1965-1968, Koster 1968-1972, Meyerspark 1972-1981, Stellenbosch Strand 1981-1982, Stellenbosch 1982-1985, professor Teologiese Skool Potchefstroom 1985-2005. |
738 | 32011.7000 | M., Drs. Theol., Steynsburg 1967-1971, Paarl Stellenbosch 1971-1975, Groblersdal 1975-2002. |
738 | 25283.5000 | Stellenbosch: Stellenbosch Museum * Kesting, DP. 1978. |
736 | 3722.8750 | Hul totale jaarlikse begroting beloop ruim 6,3 miljard €. |
610 | 19572.5000 | D., Alberton-Wes 1979-1982, Brackenhurst 1982-1996, Brackenhof 1996-2000, professor Teologiese Skool Potchefstroom 2000. |
605 | 22014.2500 | Openbare geboue Lêer:Bedford munisipaliteit. |
585 | 10837.0000 | 166 pp (verkorte weergawe, oorspronklik 188 pp.). |
579 | 2688.0000 | Johannesburg, Witwatersrand University Press. |
553 | 21183.2000 | Londen 1941: Oxford University Press * Merle Lipton: Capitalism and Apartheid. |
552 | 10493.0000 | Inflasie vereenvoudig Inflasie vernietig altyd waarde. |
545 | 10537.8571 | Sterftes * Hieronymus Bosch (* ca. 1450) -Nederlandse skilder. |
495 | 3728.6667 | Vier privaat farmaseutiese maatskappye vervaardig medisyne. |
488 | 14845.7778 | Sommige bronne (Sien verwysing What Caused the Iron Age? |
481 | 31100.4000 | D., Komatipoort 1990-1995, Springs 1995-1999, Komatipoort 1999-2007, professor Teologiese Skool Potchefstroom 2007-2012, Studentedekaan NWU 2012. |
481 | 39381.3333 | D., Vereeniging '95 2005-2008, Zambië 2008-2011 (sendingopdrag), Pretoria-Annlin 2012. |
436 | 9563.8000 | Gebeure * Geboortes * Lucanus, Romeinse digter. |
436 | 7613.2857 | Gebeure * Geboortes * Marcus Aurelius Probus, Romeinse keiser. |
426 | 22425.2000 | Eerste uitgawe, Southern Boekuitgewers: Halfweghuis. |
418 | 22322.8462 | D., Colesberg Philipstown 1990-1998, Phalaborwa 1998-2005, Pietersburg-Suid 2005-2010, professor Teologiese Skool Potchefstroom 2011. |
In contrast to subsection 4.5.2.1 (Maximum word rank in sentence), we now search for sentences that consist of rare words only. The sentences are ordered by the rank of their most frequent word, that is, by their minimum word rank, with the highest value first. The table lists the corresponding sentences that are longer than 40 characters.
By construction, these sentences contain no everyday words. As a consequence, we get either sentences with a very reduced structure (such as lists and bibliographic entries) or sentences in a foreign language. The data are therefore useful for evaluating the preprocessing, especially language detection.
select min(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m desc limit 30;
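To make the behaviour of the query easier to follow, here is a minimal sketch that runs an equivalent statement against an in-memory SQLite database. The table and column names (`sentences`, `inv_w`, `s_id`, `w_id`) come from the query itself. The `w_id` values, the English toy sentence, and the helper names `build_toy_db` and `rare_word_sentences` are hypothetical. The sketch also assumes that ids 1 to 100 are reserved, so that the rank is `w_id - 100`; the query suggests this, but the text does not state it.

```python
import sqlite3

# Hypothetical toy data; the w_id values are invented for illustration.
# Assumption: ids 1..100 are reserved, so rank = w_id - 100.
SENTENCES = [
    (1, "Johannesburg, Witwatersrand University Press."),
    (2, "Plekke: huise, koninklike howe, teaters, paleise."),
    (3, "The old house is near the river and the bridge."),
    (4, "Plekke: huise."),  # too short, dropped by the length filter
]
INV_W = [  # (w_id, s_id) pairs of the inverted word list
    (1, 1), (2, 1), (679, 1), (1100, 1), (2100, 1),
    (1, 2), (2, 2), (1391, 2), (5000, 2), (9000, 2),
    (1, 3), (101, 3), (150, 3), (5000, 3),
    (1, 4), (1391, 4), (5000, 4),
]

QUERY = """
SELECT min(i.w_id) - 100 AS m, avg(i.w_id) - 100 AS a, s.sentence
FROM sentences s, inv_w i
WHERE s.s_id = i.s_id AND length(s.sentence) > ? AND i.w_id > 100
GROUP BY s.s_id
ORDER BY m DESC
LIMIT ?
"""


def build_toy_db():
    """Create the two tables used by the query and fill them with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT)")
    conn.execute("CREATE TABLE inv_w (w_id INTEGER, s_id INTEGER)")
    conn.executemany("INSERT INTO sentences VALUES (?, ?)", SENTENCES)
    conn.executemany("INSERT INTO inv_w VALUES (?, ?)", INV_W)
    return conn


def rare_word_sentences(conn, min_length=40, limit=30):
    """Return (min rank, avg rank, sentence), highest minimum rank first."""
    return conn.execute(QUERY, (min_length, limit)).fetchall()


rows = rare_word_sentences(build_toy_db())
for m, a, sentence in rows:
    print(m, round(a, 1), sentence)
```

In this toy data the sentence containing an everyday word (rank 1) is still returned, but it sorts to the bottom. On the real corpus the `limit 30` cuts such sentences off, so only sentences made up entirely of rare words remain.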
Should we remove sentences whose most frequent word has a rank above some threshold?
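One way to explore this question is to split the sentences by their minimum rank and inspect what a given cutoff would discard. The sketch below uses three (min rank, sentence) pairs taken from the table. The function name `split_by_min_rank` and the threshold of 1000 are hypothetical and are not a recommendation.

```python
from typing import List, Tuple

RankedSentence = Tuple[int, str]  # (min rank, sentence)

# Three rows copied from the table above.
SAMPLE: List[RankedSentence] = [
    (2612, "E. s. erlangeri, mannetjie, Serengeti, Tanzanië."),
    (579, "Johannesburg, Witwatersrand University Press."),
    (495, "Vier privaat farmaseutiese maatskappye vervaardig medisyne."),
]


def split_by_min_rank(
    rows: List[RankedSentence], threshold: int
) -> Tuple[List[RankedSentence], List[RankedSentence]]:
    """Split rows into (kept, removed); a row is removed when even its
    most frequent word has a rank above the threshold."""
    kept = [row for row in rows if row[0] <= threshold]
    removed = [row for row in rows if row[0] > threshold]
    return kept, removed


# Hypothetical cutoff, chosen only to illustrate the split.
kept, removed = split_by_min_rank(SAMPLE, threshold=1000)
print("kept:", [s for _, s in kept])
print("removed:", [s for _, s in removed])
```

With this cutoff only the first sample is removed. A cutoff of 400 would also remove the third sample, which reads like a regular sentence, so any threshold needs to be checked against the sentences it would discard.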